Abstract

We examine the linear convergence rates of variants of the proximal
point method for finding zeros of maximal monotone operators.
We begin by showing how metric subregularity is sufficient for linear
convergence to a zero of a maximal monotone operator. This result is
then generalized to obtain convergence rates for the problem of finding
a common zero of multiple monotone operators by considering
randomized and averaged proximal methods.

1 Introduction 

Let $\mathcal{H}$ be a real Hilbert space and let
$T : \mathcal{H} \rightrightarrows \mathcal{H}$ be a set-valued mapping.
Two common problems that arise in several branches of applied mathematics
are to

Find $x \in \mathcal{H}$ such that $0 \in T(x)$ (1.1)

and, more generally,

Find $x \in \mathcal{H}$ such that $0 \in \bigcap_{i \in I} T_i(x)$, (1.2)

where $I$ is some index set. Specifically, these problems correspond to
finding a zero of an operator and, more generally, a common zero of
multiple operators.

Suppose that the operators under consideration are monotone, meaning
that

$\langle x_1 - x_0, y_1 - y_0 \rangle \ge 0$ for all $x_0, x_1 \in \mathcal{H}$, $y_0 \in T(x_0)$, $y_1 \in T(x_1)$.

For $\lambda > 0$, the mappings $J_{\lambda T} := (I + \lambda T)^{-1}$ are the
resolvents of $T$, which were shown to be at most single-valued in [26].
One proposed method for solving Problem 1.1 is the proximal point
algorithm, considered originally in [25] and more thoroughly explored by
[31], given by, for $k = 0, 1, 2, \ldots$,

$x_{k+1} = J_{\lambda T}(x_k)$. (1.3)
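
As a minimal illustration of iteration 1.3, the following Python sketch assumes the particular operator $T = \partial \|\cdot\|_1$ on $\mathbb{R}^n$, whose resolvent is the componentwise soft-thresholding map. This choice of operator, the starting point, and the function names `soft_threshold` and `proximal_point` are assumptions made here for illustration and do not come from the text.

```python
import numpy as np

def soft_threshold(x, lam):
    """Resolvent of T = subdifferential of the l1 norm (assumed example operator)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def proximal_point(resolvent, x0, num_iters):
    """Iterate x_{k+1} = J(x_k) and return the list of all iterates."""
    iterates = [np.asarray(x0, dtype=float)]
    for _ in range(num_iters):
        iterates.append(resolvent(iterates[-1]))
    return iterates

lam = 1.0  # fixed proximal parameter
iterates = proximal_point(lambda x: soft_threshold(x, lam), [2.5, -1.2], 5)

# The only zero of this T is the origin, so the distance to the zero set
# is simply the Euclidean norm of the iterate.
distances = [float(np.linalg.norm(x)) for x in iterates]
```

For this operator the iterates reach the zero set after finitely many steps, which is faster than the linear rate studied below; the sketch is only meant to make the iteration concrete.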

Our goal is to examine how appropriate regularity assumptions on the 
operators $T$ (or $T_1, \ldots, T_m$, respectively) affect the speed of convergence of
variants of the proximal point algorithm. In order to do so, the remainder of 
this paper is organized as follows. In Section 2, we provide notation and basic 
facts about monotone operators, metric regularity and subregularity, and the 
geometry of convex sets. Then, in Section 3, we show how assumptions of 
metric subregularity can be used to demonstrate linear convergence of both 
the proximal point algorithm for Problem 1.1 and a randomized proximal 
point algorithm for Problem 1.2. 

2 Background and Notation 

A single-valued operator $U$ is firmly non-expansive if

$\|U(x) - U(y)\|^2 + \|(I - U)(x) - (I - U)(y)\|^2 \le \|x - y\|^2 \quad \forall x, y$. (2.1)

It was shown in [31, 13] that an operator $T$ is monotone if and only if its
resolvents are firmly non-expansive. The domain of $T$ is
$\{x \in \mathcal{H} : T(x) \neq \emptyset\}$ and the inverse operator, $T^{-1}$,
is defined by $T^{-1}(y) = \{x : y \in T(x)\}$. It is known (see [32], for
example) that $T$ is monotone if and only if $T^{-1}$ is monotone and, if $T$
is maximal monotone, meaning the graph of $T$ is not strictly contained in
the graph of another monotone operator, then both $T$ and $T^{-1}$ are closed
and convex-valued and the domain of the resolvents of $T$ is $\mathcal{H}$.

We are interested in how certain regularity conditions affect local rates
of convergence. One prominent condition is the idea of metric regularity of
set-valued mappings. We say the set-valued mapping $\Phi$ is metrically
regular at $\bar{x}$ for $\bar{b} \in \Phi(\bar{x})$ if there exists
$\gamma > 0$ such that

$d(x, \Phi^{-1}(b)) \le \gamma \, d(b, \Phi(x))$ for all $(x, b)$ near $(\bar{x}, \bar{b})$. (2.2)

Further, the modulus of regularity is the infimum of all constants $\gamma$
such that Inequality 2.2 holds.

A slightly weaker condition is that of metric subregularity. We say the
set-valued mapping $\Phi$ is metrically subregular at $\bar{x}$ for
$\bar{b} \in \Phi(\bar{x})$ if there exists $\gamma > 0$ such that

$d(x, \Phi^{-1}(\bar{b})) \le \gamma \, d(\bar{b}, \Phi(x))$ for all $x$ near $\bar{x}$. (2.3)

Further, the modulus of subregularity is the infimum of all constants
$\gamma$ such that Inequality 2.3 holds. Note that for metric subregularity,
the reference vector $\bar{b}$ is fixed in Inequality 2.3 but not in
Inequality 2.2. It is clear from the definitions that metric regularity
implies metric subregularity; hence, the modulus of subregularity is no
larger than the modulus of regularity, using the convention that the
modulus of (sub)regularity is infinite if the mapping fails to be
metrically (sub)regular.
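
To make Inequality 2.3 concrete, the following Python sketch checks it numerically for an assumed linear monotone operator $T(x) = Ax$ with $A$ symmetric, positive semidefinite and singular. Here $T^{-1}(0) = \ker A$, and the constant $\gamma = 1/\sigma$, with $\sigma$ the smallest nonzero singular value of $A$, satisfies the inequality. The matrix and all variable names are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Assumed example: T(x) = A x with a singular positive semidefinite A.
A = np.diag([2.0, 0.5, 0.0])

# Orthogonal projector onto the orthogonal complement of ker(A), so that
# d(x, ker A) = ||P_row x||.
P_row = np.linalg.pinv(A) @ A

# Candidate subregularity constant: reciprocal of the smallest nonzero
# singular value of A.
sing = np.linalg.svd(A, compute_uv=False)
gamma = 1.0 / sing[sing > 1e-12].min()

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))                       # sample points x (rows)
dist_to_zero_set = np.linalg.norm(X @ P_row.T, axis=1)  # d(x, T^{-1}(0))
residual = np.linalg.norm(X @ A.T, axis=1)              # d(0, T(x)) = ||A x||
worst_ratio = float(np.max(dist_to_zero_set / residual))
```

Since this $A$ is not surjective, the mapping is not metrically regular, yet the sampled ratios stay below `gamma`, consistent with subregularity being the weaker property.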

The property of metric regularity is connected with other ideas in
variational analysis. The simplest connection, as shown in [11, Ex. 1.1], is
that metric regularity generalizes the Banach open mapping principle,
essentially saying that a bounded and linear mapping $A$ is metrically
regular if and only if it is surjective; in such a case, the modulus of
regularity is simply $\sup_{y \in B} \{ d(0, A^{-1}(y)) \}$, where $B$ is the
unit ball. If the mapping $\Phi$ has a closed, convex graph, the
Robinson-Ursescu Theorem says that $\Phi$ is metrically regular at $\bar{x}$
for $\bar{y}$ if and only if $\bar{y}$ is in the interior of the range of
$\Phi$. Metric regularity is also known to be equivalent to several other
properties in variational analysis, namely the Aubin property of
$\Phi^{-1}$ and the openness at linear rate of $\Phi$. Additionally, metric
regularity has been shown to be a generalization of the Eckart-Young result
from matrix analysis on the distance to singularity of a matrix. Further, a
result originating with Lyusternik and Graves ([24], [14]) and extended by
others (for example, [10], [16], [11]) shows that metric regularity is
determined by the first-order behavior of a mapping and is preserved by
sufficiently small first-order perturbations. Additional information about
metric regularity and its relationship to other concepts in variational
analysis can be found in [12], [16], and [11], among others.

A central tool frequently appearing in variational analysis is that of the
normal cone of a closed, convex set $S$. Specifically, the normal cone of
$S$ at $x \in S$ can be defined as

$N_S(x) := \{ x^* \in \mathcal{H} : \langle x^*, s - x \rangle \le 0 \ \ \forall s \in S \}$ (2.4)

and $N_S(x) = \emptyset$ if $x \notin S$. Let $d(x, S)$ denote the distance
from $x$ to $S$, given by $d(x, S) := \inf_{s \in S} \|x - s\|$. Further, let
$P_S(x)$ be the projection operator onto $S$, i.e., the set of such
minimizers. If $S$ is closed, convex and non-empty, then $P_S$ is
single-valued everywhere. Further, the projection operator is firmly
non-expansive ([9, Thm 5.5]) and can be characterized by

$z = P_S(x) \iff z \in S$ and $x - z \in N_S(z)$. (2.5)
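
The characterization 2.5 and the firm non-expansivity of the projection can be checked numerically in a simple case. The sketch below assumes that $S$ is the box $[-1, 1]^4$ in $\mathbb{R}^4$, for which the projection is a componentwise clip; the set, the sample sizes, and the name `project_box` are assumptions chosen for illustration.

```python
import numpy as np

def project_box(x, lo, hi):
    """Projection onto the box [lo, hi]^n (assumed example of a closed convex set)."""
    return np.clip(x, lo, hi)

rng = np.random.default_rng(0)
lo, hi = -1.0, 1.0
x = rng.normal(scale=3.0, size=4)
y = rng.normal(scale=3.0, size=4)
z = project_box(x, lo, hi)

# Characterization 2.5: x - z lies in N_S(z), i.e. <x - z, s - z> <= 0 for
# every s in S. We test this on points sampled from S.
samples = rng.uniform(lo, hi, size=(1000, 4))
normal_cone_gap = float(np.max((samples - z) @ (x - z)))

# Firm non-expansivity (Inequality 2.1) for U = P_S on one pair (x, y).
px, py = project_box(x, lo, hi), project_box(y, lo, hi)
lhs = float(np.sum((px - py) ** 2) + np.sum(((x - px) - (y - py)) ** 2))
rhs = float(np.sum((x - y) ** 2))
```

A non-positive `normal_cone_gap` and `lhs <= rhs` are what 2.5 and 2.1 predict; sampling can only fail to refute these inequalities, not prove them.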

A method of characterizing regularity of closed sets $S_1, \ldots, S_m$ is by
considering regularity properties of a related set-valued mapping. Given
a Hilbert space $\mathcal{H}$, consider the product space $\mathcal{H}^m$
with the induced inner product defined by

$\langle (x_1, x_2, \ldots, x_m), (y_1, y_2, \ldots, y_m) \rangle = \sum_{i=1}^{m} \langle x_i, y_i \rangle$

and consider the set-valued mapping given by
$\Phi(x) = [S_1 - x, \ldots, S_m - x]^T$. Note that $0 \in \Phi(x)$ if and only
if $x \in \bigcap_i S_i$. Using metric regularity as a starting point,
suppose $\Phi$ is metrically regular at $\bar{x}$ for $0$. From the
definition, metric regularity of $\Phi$ at $\bar{x}$ for $0$ is equivalent to
the strong metric inequality examined in [18] and [19], among others,
defined by the existence of $\beta, \delta > 0$ such that, for
$i = 1, \ldots, m$,

$d(x, \bigcap_i (S_i - z_i)) \le \beta \max_{1 \le i \le m} d(x + z_i, S_i)$ for all $x \in \bar{x} + \delta B$, $z_i \in \delta B$. (2.6)

Characterizing this in terms of normal cones, it was shown in [19, Thm. 1,
Prop. 10, Cor. 2] that this is equivalent to the existence of a constant
$k > 0$ such that

$z_i \in \delta B$, $y_i \in N_{S_i}(\bar{x} + z_i)$ $(i = 1, \ldots, m) \implies \sum_i \|y_i\|^2 \le k^2 \left\| \sum_i y_i \right\|^2$. (2.7)

By using the formula in [32, Thm 9.43] for expressing the modulus of
regularity in terms of coderivatives, it was shown in [22] that the modulus
of regularity of $\Phi$ at $\bar{x}$ for $0$ equals

$\lim_{\delta \downarrow 0} \left[ \inf \{ k : \text{Inequality 2.7 holds} \} \right]$,

with an infinite value being equivalent to a lack of metric regularity
of $\Phi$.

Consider a relaxed variant of the strong metric inequality, known simply
as the metric inequality as studied in [16], [27] and [19] among others,
defined to hold at $\bar{x}$ if there exists $\beta > 0$ such that

$d(x, \bigcap_i S_i) \le \beta \max_{1 \le i \le m} d(x, S_i)$ for all $x \in \bar{x} + \delta B$. (2.8)

If Inequality 2.8 is valid for $\delta = \infty$, we obtain the property of
linear regularity and if it holds for all $\delta > 0$, it is equivalent to
the property of bounded linear regularity, as studied in [3], [4], [5], [6],
[7] and others, often in an algorithmic context. It is easy to show that the
existence of a $\delta > 0$ such that Inequality 2.8 holds is equivalent to
the previously defined mapping $\Phi$ being metrically subregular at
$\bar{x}$ for $0$.
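
As a small worked instance of Inequality 2.8, the Python sketch below assumes two lines through the origin in $\mathbb{R}^2$ meeting at angle $\theta = \pi/3$, so that $\bigcap_i S_i = \{0\}$. Because both sides of 2.8 scale with $\|x\|$, it is enough to sample the unit circle; the smallest valid $\beta$ is estimated on a grid and compared with the value $1/\sin(\theta/2)$ obtained by hand for this configuration. The sets, the angle, and the variable names are illustrative assumptions.

```python
import numpy as np

theta = np.pi / 3  # assumed angle between the two lines

# S_1 is the horizontal axis; S_2 is the line through the origin at angle theta.
# The distance to a line through the origin is |<x, n>| for a unit normal n.
n1 = np.array([0.0, 1.0])
n2 = np.array([-np.sin(theta), np.cos(theta)])

# Sample the unit circle, where d(x, S_1 ∩ S_2) = ||x|| = 1.
phis = np.linspace(0.0, 2.0 * np.pi, 200001)
X = np.column_stack([np.cos(phis), np.sin(phis)])
d1 = np.abs(X @ n1)
d2 = np.abs(X @ n2)

# Smallest beta with 1 <= beta * max(d1, d2) over the sampled points.
beta_numeric = float(np.max(1.0 / np.maximum(d1, d2)))
beta_by_hand = 1.0 / np.sin(theta / 2.0)
```

In this example the inequality holds with $\delta = \infty$, so the pair of lines is linearly regular, and $\beta$ grows without bound as the angle between the lines shrinks.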

Our focus for the remainder of this paper will involve metric subregularity.
Unfortunately, several of the stability properties and some of the geometric
intuition that accompany metric regularity, especially that relating to
normal cones of sets, fail to have a natural equivalent for metric
subregularity; some examples of this phenomenon are given in [12]. However,
since metric regularity implies metric subregularity, the intuition provided
by metric regularity can be applied to the following results when that
property does, in fact, hold. Additionally, if the monotone operators under
consideration are actually subdifferentials of convex functions,
characterizations of both metric regularity and subregularity in terms of
the underlying function were shown in [2], providing additional intuition.

3 Metric Regularity and Linear Convergence 

We now return to Problem 1.1, the problem of finding a zero of a maximal 
monotone operator. Variants of proximal point algorithms for solving this 
and related problems have been considered by a wide variety of authors, 
including [31], [23], [33], [28], [1] and others. 

Many authors consider an algorithmic framework much more general than
the one considered in this paper. Some of the better-studied variants allow
for a varying proximal parameter $\lambda$, allow approximate computation of
the proximal iteration, allow over- or under-relaxation in the proximal step
or incorporate an additional projective framework. These ideas have often
proven worthwhile both for designing a computationally practical and
efficient algorithm and for improving the convergence analysis. However, in
this paper, we will only consider algorithms in their "classical" form,
assuming exact computation of the resolvent with a fixed proximal parameter.
Our particular interest is in exploring how naturally occurring constants
(for example, the modulus of subregularity of the mappings themselves and of
the mapping associated with the solution sets) govern the local rate of
convergence and, further, how randomization as an analytical tool can
emphasize this connection. To begin, consider the basic proximal point
algorithm given by 1.3, where $x_{k+1} = J_{\lambda T}(x_k)$. Under an
assumption of metric subregularity, we obtain the following initial result.

Theorem 3.1 Suppose $T$ is maximal monotone and metrically subregular
at $\bar{x} \in T^{-1}(0)$ for $0$ with regularity modulus $\gamma$. Let
$\hat{\gamma} > \gamma$ and suppose $x_0$ is sufficiently near $\bar{x}$.
Then the iterates given by Algorithm 1.3 are linearly convergent to
$T^{-1}(0)$, the zero-set of $T$, satisfying

$d(x_{k+1}, T^{-1}(0))^2 \le \frac{\hat{\gamma}^2}{\lambda^2 + \hat{\gamma}^2} \, d(x_k, T^{-1}(0))^2$.

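Proof Let $\tilde{x} \in T^{-1}(0)$ and note that
$J_{\lambda T}(\tilde{x}) = \tilde{x}$. Since the resolvent of a monotone
operator is firmly non-expansive, it follows from Inequality 2.1 that, for
any $x$,

$\|J_{\lambda T}(x) - J_{\lambda T}(\tilde{x})\|^2 \le \|x - \tilde{x}\|^2 - \|(I - J_{\lambda T})(x) - (I - J_{\lambda T})(\tilde{x})\|^2$,

implying that

$\|J_{\lambda T}(x) - \tilde{x}\|^2 \le \|x - \tilde{x}\|^2 - \|x - J_{\lambda T}(x)\|^2$. (3.2)

However, by definition of $J_{\lambda T}$,

$x - J_{\lambda T}(x) \in \lambda T(J_{\lambda T}(x))$.

In particular,

$\|x - J_{\lambda T}(x)\| \ge \lambda \min\{\|z\| : z \in T(J_{\lambda T}(x))\} = \lambda \, d(0, T(J_{\lambda T}(x)))$. (3.3)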

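Now, note that since the resolvents and projection operators are firmly
non-expansive, if $x_0$ has the property of being sufficiently close to
$\bar{x}$ such that Inequality 2.3 holds with constant $\hat{\gamma}$, then
$x_j$ and $P_{T^{-1}(0)}(x_j)$ do as well for each $j > 0$. Therefore, it
follows that

$$
\begin{aligned}
d(x_{k+1}, T^{-1}(0))^2 &\le \|x_{k+1} - P_{T^{-1}(0)}(x_k)\|^2 \\
&\le \|x_k - P_{T^{-1}(0)}(x_k)\|^2 - \|x_k - J_{\lambda T}(x_k)\|^2 && \text{(Inequality 3.2)} \\
&\le d(x_k, T^{-1}(0))^2 - \lambda^2 \, d(0, T(J_{\lambda T}(x_k)))^2 && \text{(Inequality 3.3)} \\
&\le d(x_k, T^{-1}(0))^2 - \frac{\lambda^2}{\hat{\gamma}^2} \, d(J_{\lambda T}(x_k), T^{-1}(0))^2 && \text{(Inequality 2.3)} \\
&= d(x_k, T^{-1}(0))^2 - \frac{\lambda^2}{\hat{\gamma}^2} \, d(x_{k+1}, T^{-1}(0))^2.
\end{aligned}
$$

This implies that

$\left(1 + \frac{\lambda^2}{\hat{\gamma}^2}\right) d(x_{k+1}, T^{-1}(0))^2 \le d(x_k, T^{-1}(0))^2$,

from which the result follows. □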
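
The rate in Theorem 3.1 can be sanity-checked numerically. The sketch below assumes the linear operator $T(x) = Ax$ with $A$ symmetric positive definite, so that $T^{-1}(0) = \{0\}$, the resolvent is $(I + \lambda A)^{-1}$, and the constant $\gamma = 1/\lambda_{\min}(A)$ satisfies Inequality 2.3. The matrix, the value of $\lambda$, the factor 1.01 used to pick a constant above $\gamma$, and the variable names are assumptions made for illustration.

```python
import numpy as np

# Assumed example operator T(x) = A x with A symmetric positive definite.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
lam = 0.7

# d(x, T^{-1}(0)) = ||x|| <= gamma * ||A x|| with gamma = 1 / lambda_min(A).
gamma = 1.0 / np.linalg.eigvalsh(A).min()
gamma_hat = 1.01 * gamma  # any constant strictly above gamma
rate = gamma_hat**2 / (lam**2 + gamma_hat**2)  # bound from Theorem 3.1

M = np.eye(2) + lam * A  # the resolvent is x -> M^{-1} x
x = np.array([5.0, -3.0])
ratios = []
for _ in range(30):
    x_next = np.linalg.solve(M, x)
    # Observed contraction of the squared distance to the zero set {0}.
    ratios.append(float(x_next @ x_next) / float(x @ x))
    x = x_next
worst_observed = max(ratios)
```

In this example every observed ratio stays below the theoretical bound `rate`, and the bound itself tends to zero as `lam` grows, in line with the remark on increasing proximal parameters.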



Further observe that by considering a sequence $\{\lambda_k\}$ such that
$\lambda_k \to \infty$ instead of a fixed $\lambda$ in the above algorithm,
we obtain superlinear convergence.

Our primary interest in Theorem 3.1 is as a tool in proving the following
result, Theorem 3.5. However, we note that Theorem 3.1 is similar to some
previously known results. For example, linear convergence was shown in [31]
and [33], under a framework that permitted error in evaluating the resolvent,
with a slightly stronger regularity assumption. In particular, as a limiting
case with no such error in evaluating the resolvent, an identical convergence
rate was obtained in [31]. The result by Solodov and Svaiter in [33], however,
corresponds to a hybrid proximal-projection algorithm.

We wish to generalize this result to Problem 1.2, finding a common
zero among a group of maximal monotone operators $T_1, \ldots, T_m$. Variants
of proximal point algorithms for this problem have been considered by a
variety of authors, including [17], [20], [8], [15], among others. In what
follows, consider the following randomized variant of a proximal point
algorithm: for $k = 0, 1, 2, \ldots$,

$x_{k+1} = J_{\lambda T_i}(x_k)$ with probability $\frac{1}{m}$, $i = 1, \ldots, m$. (3.4)

Then we obtain the following result.
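
Theorem 3.5 Suppose the following assumptions hold:

1. The maximal monotone operators $T_i$, $i = 1, \ldots, m$, are metrically
subregular at $\bar{x} \in \bigcap_j T_j^{-1}(0)$ for $0$ with modulus $\gamma_i$.

2. The mapping $\Phi(x) = [T_1^{-1}(0) - x, \ldots, T_m^{-1}(0) - x]^T$ is
metrically subregular at $\bar{x}$ for $0$ with modulus $\kappa$.

3. $\hat{\gamma} > \max\{\gamma_1, \ldots, \gamma_m\}$ and $\hat{\kappa} > \kappa$.

4. $\lambda^2 > 3\hat{\gamma}^2$.

Then for $x_0$ sufficiently close to $\bar{x}$, Algorithm 3.4 satisfies

$d(x_{k+1}, \bigcap_j T_j^{-1}(0)) \le d(x_k, \bigcap_j T_j^{-1}(0))$

and

$E\left[ d(x_{k+1}, \bigcap_j T_j^{-1}(0))^2 \mid x_k \right] \le \left( 1 - \frac{1}{m\hat{\kappa}^2} + \frac{2}{m\hat{\kappa}^2} \left( \frac{\hat{\gamma}^2}{\lambda^2 + \hat{\gamma}^2} \right)^{1/2} \right) d(x_k, \bigcap_j T_j^{-1}(0))^2$.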

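Proof If $x_0$ is sufficiently close to $\bar{x}$ such that Inequality 2.3
holds with constant $\hat{\gamma}$ for each mapping $T_j$, it follows from the
firm non-expansivity of the resolvents and the projection operator that each
iterate $x_k$ and the projection of each iterate onto the common zero set,
$P_{\bigcap_j T_j^{-1}(0)}(x_k)$, are sufficiently close to $\bar{x}$ as well.
Additionally, this implies the first conclusion of the theorem.

Suppose that at iteration $k$ the resolvent $J_{\lambda T_i}$ is chosen by the
algorithm. Then it follows that

$$
\begin{aligned}
d(J_{\lambda T_i}(x_k), \textstyle\bigcap_j T_j^{-1}(0))^2
&= \|J_{\lambda T_i}(x_k) - P_{\bigcap_j T_j^{-1}(0)}(J_{\lambda T_i}(x_k))\|^2 \\
&\le \|J_{\lambda T_i}(x_k) - P_{\bigcap_j T_j^{-1}(0)}(x_k)\|^2 && \text{(Definition of Projection)} \\
&\le d(x_k, \textstyle\bigcap_j T_j^{-1}(0))^2 - \|x_k - J_{\lambda T_i}(x_k)\|^2 && \text{(Inequality 3.2)} \\
&= d(x_k, \textstyle\bigcap_j T_j^{-1}(0))^2 - \|[x_k - P_{T_i^{-1}(0)}(x_k)] + [P_{T_i^{-1}(0)}(x_k) - J_{\lambda T_i}(x_k)]\|^2 \\
&\le d(x_k, \textstyle\bigcap_j T_j^{-1}(0))^2 - d(x_k, T_i^{-1}(0))^2 - \|P_{T_i^{-1}(0)}(x_k) - J_{\lambda T_i}(x_k)\|^2 \\
&\qquad - 2 \langle x_k - P_{T_i^{-1}(0)}(x_k), P_{T_i^{-1}(0)}(x_k) - J_{\lambda T_i}(x_k) \rangle.
\end{aligned}
$$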
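
Note that

$$
\begin{aligned}
-2 \langle x_k - P_{T_i^{-1}(0)}(x_k), &\, P_{T_i^{-1}(0)}(x_k) - J_{\lambda T_i}(x_k) \rangle \\
&= 2 \langle x_k - P_{T_i^{-1}(0)}(x_k), [J_{\lambda T_i}(x_k) - P_{T_i^{-1}(0)}(J_{\lambda T_i}(x_k))] + [P_{T_i^{-1}(0)}(J_{\lambda T_i}(x_k)) - P_{T_i^{-1}(0)}(x_k)] \rangle \\
&\le 2 \langle x_k - P_{T_i^{-1}(0)}(x_k), J_{\lambda T_i}(x_k) - P_{T_i^{-1}(0)}(J_{\lambda T_i}(x_k)) \rangle \\
&\le 2 \, \|x_k - P_{T_i^{-1}(0)}(x_k)\| \, \|J_{\lambda T_i}(x_k) - P_{T_i^{-1}(0)}(J_{\lambda T_i}(x_k))\| \\
&= 2 \, d(x_k, T_i^{-1}(0)) \, d(J_{\lambda T_i}(x_k), T_i^{-1}(0)) \\
&\le 2 \left( \frac{\hat{\gamma}^2}{\lambda^2 + \hat{\gamma}^2} \right)^{1/2} d(x_k, T_i^{-1}(0))^2.
\end{aligned}
$$

The first inequality comes from the fact that
$x_k - P_{T_i^{-1}(0)}(x_k) \in N_{T_i^{-1}(0)}(P_{T_i^{-1}(0)}(x_k))$, so the
inequality in 2.4 can be applied from the definition of the normal cone. The
second inequality is an application of the Cauchy-Schwarz inequality. The
rest follows from the definition of the projection operator, followed by
applying Theorem 3.1, the previous linear convergence result. Putting this
together, we obtain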



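$d(J_{\lambda T_i}(x_k), \bigcap_j T_j^{-1}(0))^2 \le d(x_k, \bigcap_j T_j^{-1}(0))^2 - \left( 1 - 2 \left( \frac{\hat{\gamma}^2}{\lambda^2 + \hat{\gamma}^2} \right)^{1/2} \right) d(x_k, T_i^{-1}(0))^2$.

Taking the expected value, we obtain

$$
\begin{aligned}
E\left[ d(x_{k+1}, \textstyle\bigcap_j T_j^{-1}(0))^2 \mid x_k \right]
&\le d(x_k, \textstyle\bigcap_j T_j^{-1}(0))^2 - \frac{1}{m} \left( 1 - 2 \left( \frac{\hat{\gamma}^2}{\lambda^2 + \hat{\gamma}^2} \right)^{1/2} \right) \sum_{i=1}^{m} d(x_k, T_i^{-1}(0))^2 \\
&= d(x_k, \textstyle\bigcap_j T_j^{-1}(0))^2 - \frac{1}{m} \left( 1 - 2 \left( \frac{\hat{\gamma}^2}{\lambda^2 + \hat{\gamma}^2} \right)^{1/2} \right) d(0, \Phi(x_k))^2 \\
&\le \left( 1 - \frac{1}{m\hat{\kappa}^2} + \frac{2}{m\hat{\kappa}^2} \left( \frac{\hat{\gamma}^2}{\lambda^2 + \hat{\gamma}^2} \right)^{1/2} \right) d(x_k, \textstyle\bigcap_j T_j^{-1}(0))^2,
\end{aligned}
$$

where the last inequality follows from the metric subregularity of the mapping
$\Phi(x) = [T_1^{-1}(0) - x, \ldots, T_m^{-1}(0) - x]^T$. □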

Note that the last assumption in Theorem 3.5 ensures that the convergence
rate is less than 1. Additionally, this type of convergence result implies that
$d(x_k, \bigcap_j T_j^{-1}(0)) \to 0$ almost surely (cf. [21]).
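
The randomized iteration 3.4 can be simulated in a few lines. The sketch below assumes two linear monotone operators $T_i(x) = a\, u_i u_i^T x$ on $\mathbb{R}^2$, whose zero sets are the lines orthogonal to $u_i$ and whose only common zero is the origin. With $a = 4$ and $\lambda = 1$, each operator satisfies Inequality 2.3 with constant $1/a = 0.25$, so assumption 4 of Theorem 3.5 holds for constants slightly above that value. The operators, parameters, seed, and function names are assumptions chosen for illustration.

```python
import numpy as np

def make_resolvent(u, a, lam):
    """Resolvent (I + lam*A)^{-1} of the assumed linear operator A = a * u u^T."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)
    M = np.eye(u.size) + lam * a * np.outer(u, u)
    return lambda x: np.linalg.solve(M, x)

def randomized_proximal_point(resolvents, x0, num_iters, rng):
    """Apply a uniformly chosen resolvent at every step (Algorithm 3.4)."""
    x = np.asarray(x0, dtype=float)
    history = [x]
    for _ in range(num_iters):
        i = rng.integers(len(resolvents))  # each index with probability 1/m
        x = resolvents[i](x)
        history.append(x)
    return history

rng = np.random.default_rng(0)
resolvents = [
    make_resolvent([1.0, 0.0], a=4.0, lam=1.0),
    make_resolvent([1.0, 1.0], a=4.0, lam=1.0),
]
history = randomized_proximal_point(resolvents, [3.0, -2.0], 200, rng)

# The common zero set is {0}, so the distance to it is the Euclidean norm.
dists = [float(np.linalg.norm(x)) for x in history]
```

The recorded distances are non-increasing along every sample path, matching the first conclusion of Theorem 3.5, while the expected-value bound concerns the average behaviour over the random choices.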

One particularly simple way of de-randomizing Algorithm 3.4 is by
considering averaged resolvents or, in the terminology of [20], the
barycentric proximal method. Specifically, given maximal monotone operators
$T_i$, $i = 1, \ldots, m$, with respective resolvents $J_{\lambda T_i}$,
$i = 1, \ldots, m$, consider the algorithm described such that, for
$k = 0, 1, 2, \ldots$,

$x_{k+1} = \frac{1}{m} \sum_{i=1}^{m} J_{\lambda T_i}(x_k)$ (3.6)

and the associated fixed-point problem

Find $x$ such that $x = \frac{1}{m} \sum_{i=1}^{m} J_{\lambda T_i}(x)$. (3.7)
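
A minimal Python sketch of the averaged iteration 3.6 follows, assuming two linear operators $T_i(x) = a\, u_i u_i^T x$ on $\mathbb{R}^2$ whose only common zero is the origin. The operators, parameters, and the names `make_resolvent` and `barycentric_step` are illustrative assumptions rather than part of the text.

```python
import numpy as np

def make_resolvent(u, a, lam):
    """Resolvent (I + lam*A)^{-1} of the assumed linear operator A = a * u u^T."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)
    M = np.eye(u.size) + lam * a * np.outer(u, u)
    return lambda x: np.linalg.solve(M, x)

def barycentric_step(resolvents, x):
    """One step of Algorithm 3.6: the average of all resolvents applied to x."""
    return sum(J(x) for J in resolvents) / len(resolvents)

resolvents = [
    make_resolvent([1.0, 0.0], a=4.0, lam=1.0),
    make_resolvent([1.0, 1.0], a=4.0, lam=1.0),
]

x = np.array([3.0, -2.0])
dists = [float(np.linalg.norm(x))]  # distance to the common zero set {0}
for _ in range(200):
    x = barycentric_step(resolvents, x)
    dists.append(float(np.linalg.norm(x)))

# The common zero solves the fixed-point problem 3.7.
fixed_point_residual = float(np.linalg.norm(barycentric_step(resolvents, np.zeros(2))))
```

Unlike the randomized method, this iteration is deterministic, at the cost of evaluating all $m$ resolvents at every step.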

The following proposition, found in [20], provides the necessary connection. 

Proposition 3.8 ([20]) If $x \in \bigcap_i T_i^{-1}(0)$, then $x$ is a solution
to Problem 3.7. Further, if $\bigcap_i T_i^{-1}(0) \neq \emptyset$, the fixed
points of Problem 3.7 are common fixed points of all the $J_{\lambda T_i}$'s.

Considering the example where each operator $T_i$ is the normal cone
mapping for some closed, convex set, it follows that Algorithm 3.6 is simply
the averaged projections algorithm studied by [29], [30], [3], [22], and [21],
among others. More generally, we can use the result of Theorem 3.5 to
generalize a result on averaged projections found in [21, Thm 5.8] to the
barycentric proximal method.

Theorem 3.9 Suppose the assumptions of Theorem 3.5 hold. Then the
conclusions of Theorem 3.5 hold for Algorithm 3.6 as well.

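Proof Let $x_k$ be the current iterate, $x_{k+1}^{B}$ be the new iterate in
the barycentric proximal method, Algorithm 3.6, and let $x_{k+1}^{R}$ be the
new iterate in the randomized proximal point method, Algorithm 3.4. First,
note that since each set $T_i^{-1}(0)$ is convex, the distance function
$d(\,\cdot\,, \bigcap_j T_j^{-1}(0))$ is as well, and

$d(J_{\lambda T_i}(x_k), \bigcap_j T_j^{-1}(0)) \le d(x_k, \bigcap_j T_j^{-1}(0))$ for $i = 1, \ldots, m$,

from which it follows that

$d(x_{k+1}^{B}, \bigcap_j T_j^{-1}(0)) \le d(x_k, \bigcap_j T_j^{-1}(0))$.

Let $\alpha = \left( 1 - \frac{1}{m\hat{\kappa}^2} + \frac{2}{m\hat{\kappa}^2} \left( \frac{\hat{\gamma}^2}{\lambda^2 + \hat{\gamma}^2} \right)^{1/2} \right)$
and observe that the function $d(\,\cdot\,, \bigcap_j T_j^{-1}(0))^2$ is also
convex. Noting that

$x_{k+1}^{B} = \frac{1}{m} \sum_{i=1}^{m} J_{\lambda T_i}(x_k) = E[x_{k+1}^{R} \mid x_k]$,

it follows that

$$
\begin{aligned}
d(x_{k+1}^{B}, \textstyle\bigcap_j T_j^{-1}(0))^2 &\le E\left[ d(x_{k+1}^{R}, \textstyle\bigcap_j T_j^{-1}(0))^2 \mid x_k \right] \\
&\le \alpha \, d(x_k, \textstyle\bigcap_j T_j^{-1}(0))^2
\end{aligned}
$$

from an application of Jensen's Inequality. □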

In particular, the barycentric proximal method converges at least as 
quickly as the randomized proximal point method. 
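
The Jensen step behind this comparison can be checked directly for a single iteration. The sketch below assumes the same kind of linear operators $T_i(x) = a\, u_i u_i^T x$ on $\mathbb{R}^2$ with common zero set $\{0\}$, and compares the squared distance after one barycentric step with the exact expected squared distance after one randomized step from the same point. All operators, parameters, and names are assumptions made for illustration.

```python
import numpy as np

def make_resolvent(u, a, lam):
    """Resolvent (I + lam*A)^{-1} of the assumed linear operator A = a * u u^T."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)
    M = np.eye(u.size) + lam * a * np.outer(u, u)
    return lambda x: np.linalg.solve(M, x)

resolvents = [
    make_resolvent([1.0, 0.0], a=4.0, lam=1.0),
    make_resolvent([1.0, 1.0], a=4.0, lam=1.0),
]

rng = np.random.default_rng(2)
gaps = []
for _ in range(100):
    x = rng.normal(size=2)                 # an arbitrary current iterate x_k
    images = [J(x) for J in resolvents]    # the m possible randomized iterates
    # Squared distance to {0} after one barycentric step.
    bary_sq = float(np.linalg.norm(np.mean(images, axis=0)) ** 2)
    # Exact conditional expectation for the randomized step (uniform choice).
    expected_sq = float(np.mean([np.linalg.norm(y) ** 2 for y in images]))
    gaps.append(expected_sq - bary_sq)
min_gap = min(gaps)
```

A non-negative `min_gap` reflects the convexity of the squared distance function; it compares one-step quantities only and says nothing about the per-iteration cost, which is $m$ resolvent evaluations for the barycentric method against one for the randomized method.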